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Abstract 


Optimal data compression by polynomial fitting is discussed from a rigorous 
theoretical point of view. By means of information gained from smoothness proper- 
ties of the functions based on the constructive theory of functions, it is shown how 
estimates can be improved over the usual minimum-variance unbiased estimates. 
A conclusion is that measurements should be made as often as possible. 

The theory developed here extends beyond the applications given to polynomial 
fitting. Proof is given for various theorems concerning biased estimators. One 
theorem, given on the result of using an approximation to the covariance matrix of 
the errors of the observations, is believed to be new. A similar technique might 
be used in estimating any set of parameters whose magnitude decreased to zero in 
a reasonable way. 
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The Enhancement of Data by Data Compression 
Using Polynomial Fitting 


I. Introduction 

One method of data compression used in orbit deter- 
mination consists of fitting many observations of a given 
data type in a fixed time interval by a polynomial involv- 
ing fewer parameters. Then instead of dealing with the 
many observations, the parameters of the polynomial 
are used. 

It is commonly assumed that such data compression 
must necessarily degrade the orbit determination. Here it 
is shown that the usual methods of fitting polynomials, 
which have a bad theoretical foundation, may lead to 
decreased accuracies, but optimal methods should signifi- 
cantly increase the accuracies obtained by the usual un- 
biased minimum- variance formulas. 

The first problem with which we will concern ourselves 
is that of polynomial fitting for the general case when the 
function to be fitted is not an exact polynomial of finite 
degree. Using the results of approximation theory, we 
show how one can take advantage of the smoothness of 
the function to be approximated in an optimal way by 
obtaining, in a sense, a priori values for the polynomial 
coefficients. 


The result is an estimate of the orbital parameters that 
may be slightly biased, but has better accuracy than that 
obtained from the minimum-variance unbiased estimate 
without data compression. Also one need not choose a 
unique degree for the approximating polynomial. With 
the method used in this Report, the same result obtains 
if the degree is chosen large enough. This means that one 
does not have to apply the usual F test, which usually 
cannot be theoretically justified. 

Various theorems are proved about biased estimators. 
One theorem, which the author believes to be new, con- 
cerns bounds for errors introduced by using a wrong 
noise method. 


II. Approximation of Functions by Polynomials 

Engineers know that any ‘smooth” function f(x), 
a^x^b, can be represented by a polynomial with arbi- 
trary accuracy in a given interval, so they often use the 
following simplistic method for obtaining the desired 
approximation. They choose an integer n, calculate a step 
size A x ~ (b — a)/n , and evaluate f(x) at x = a, a + Ax , 
a + 2 Ax, • * • a + n Ax = b. A polynomial of degree n, 
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p n (x), is fitted to these points. If n is chosen high enough, 
then the actual result is supposed to be a polynomial 
p m (x) of degree m with m < n. This is not the case, how- 
ever. Such a procedure may “blow up” [p n (x) may not 
converge to f(x) in any sense for increasing n] even for 
functions f(x) that are infinitely differentiable. If the 
reader cares to verify this he can try equally spaced inter- 
polation on f(x) ~ 1/(1 + X-) for the interval [ — 5,5]. No 
matter how accurately he will do the interpolation, the 
error in the representation max x |/(x) — p n (x) | [ — 5^x— 5] 
will increase beyond all bounds as n becomes large. 

An introductory account of the constructive theory of 
functions, which deals with problems of approximation by 
polynomials, is given by Todd (Ref. 1). A more complete 
account, which covers the material we will use, is given 
by Lorentz (Ref. 2). 

Some classical results that we will use are: 

1. Let H n be the set of all polynomials whose degree 
does not exceed n, and p n (x) a member of H n . For con- 
tinuous f (x), a^x^b y define 

E n = inf max \f(x) — p n (x)\ 

PnC H n a 

Then there exists a unique member of H n , ~p n (x) such that 
E n — max J .|/(x) — p n (x)\; jp»(x) is called the polynomial 
of best approximation (in the sense of the uniform norm). 

2. lim E n = 0 for all continuous functions. 

n -» oo 

3. The manner in which E n approaches zero depends 
on the differentiability of f(x). The set of real functions 
having k continuous derivatives on [a, b] is denoted by C k . 
The greater the value of k y the faster E n will approach zero 
with increasing n. In fact, one can deduce the value of k 
from the behavior of E n as a function of n. 

4. It does not make sense (in a practical sort of way) 
to try to accurately approximate a function with a poly- 
nomial unless it is many times differentiable, or better still, 
infinitely differentiable. 

5. The polynomial of best approximation p n (x) cannot 
be obtained in general by linear methods. There is no 
way of finding J) n (x) by evaluating f (x) at a predeter- 
mined number of values of x and calculating coefficients 
of the polynomial by linear relations. The algorithms for 
determining ~p n (x) are nonlinear and are usually time- 
consuming. 

6. Usually acceptable polynomial approximations can 
be obtained by truncating a Chebyshev series expansion. 


Let f (x) be defined for — 1 x ^ 1 and satisfy a Dini- 
Lipschitz condition, or 


lim [a> (8) In 8] ~ 0 (1) 

<5 — > o 


where a> (8) is the modulus of continuity. Let T„ (x) be the 
Chebyshev polynomial of the first kind of degree n. Then 
if f(x) satisfies the Dini-Lipschitz condition, f(x) can be 
represented by a uniformly convergent Chebyshev poly- 
nomial series 


f(x)= 2 a n T n (x), 


■ 1 — x — 1 


(2) 


where 


1 r 

a o = - / 

-H 


fix) 


-i VI — X 
2 f'f(x)T n (x) 


dx 

dxy 


n = 1,2, 


VI “ x ~ 

The Chebyshev polynomials are defined by 
T n (x) = cos (n arc cos x) 


(3) 


(4) 


The condition (1) is satisfied if / (x) satisfies a Lipschitz 
condition, or if f (x) is bounded on [ — 1, 1]. 


For well-behaved functions the truncated expansion of 
Eq. (2) is a good approximation and normally has a maxi- 
mum absolute error not much greater than for"p n (x). The 
drawback of this expansion is that it is usually not possible 
to express the a n in terms of known functions. However, 
approximate numerical methods for obtaining the a n are 
available. 


7. The coefficients a n for n = 0, 1, 2, * * * N are given 
approximately by 

2 nl 

/(*7)r n (*?), 

where m is an integer >N and x™, / = 1,2, • • m, are 

the m zeros of T m (x). The roots are given by 



xj = cos — - 


7T (2/ - 1) 


m 


k = 1 , 2 , 


m 


( 6 ) 


When m is taken to be N + 1, then Eq. (5) gives us the 
coefficients of the interpolated Nth-degree polynomial 
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determined by the N + 1 roots of T N+l ( x ). It can be shown 
that for fixed N the approximate values of Eq. (5) will con- 
verge to the exact values given by the integrals in Eq. (3) 
as m — > oo for any continuous function f(x ). 

8. To establish this result we interpret Eq. (5) as the 
approximation of Eq. (3), using Gaussian quadrature with 
a weight function (1 — x 2 ) Here the form of the approxi- 
mation is 


r - g(x) dx^-y. gTcos 

J- 1 \/l — X 2 m J = 1 L 


7T (2/ ~ 1) ~ 

2 m 


(7) 


We next use a theorem by Stieltjes, which establishes that 
the general Gaussian quadrature scheme (increasing m ) 
for any weight function on a finite interval is convergent 
for any continuous function g (x). In evaluating Chebyshev 
series and the special series in Eq. (5), various tricks can 
be used (discussed in the Appendix). The important point 
is that Chebyshev series can be accurately and quickly 
evaluated. 


before we really know how its coefficients approach zero. 
Let us illustrate with a few examples. We give in Table 1 
the Chebyshev coefficients for the expansions of e? and 
in the interval [0,1], Of course the variable t in T„ ( t ) 
has to be related to x by a linear transformation so that 
t ~ 1 x — 0 and t = l«-»x = 1. 


Table 1 . Chebyshev polynomial expansions for e x and 
e“* on interval [0, 1 ] giving coefficients a n of 

2 aJn (f), t = 2x - 1 


n 

an for e x 

o n for e' r 

0 

1.753387654 

0.645035270 

1 

0.850391654 

— 0.312841606 

2 

0.105208694 

0.038704116 

3 

0.008722105 

-0.003208683 

4 

0.000543437 

0.000199919 

5 

0.000027115 

— 0.000009975 

6 

0.000001128 

0.000000415 

7 

0.000000040 

—0.000000015 

8 

0.000000001 

0.000000000 


We thus have an efficient method for determining a 
polynomial that uniformly approximates a function (which 
we assume to be at least many times differentiable) to any 
desired accuracy. We note that the coefficients are linear 
combinations of certain values of f (x). 

III. The Convergence of Chebyshev Series 

The central idea of this Report is that in orbit- 
determination problems, at least, we can do much better 
than determine the degree of a polynomial by the usual 
F test. Without discussing this standard method in detail, 
we seek to determine whether the improvement in fit that 
results from adding one or more parameters, or making the 
degree of the polynomial higher, is statistically significant. 


Note how the magnitude of the coefficients decreases 
in size approximately according to Kp n , where K and p are 
constants and p & 0.1. In fact, for e*, 0 ^x 1, we have 
for all n 

| a n | ^75.59 (0.0514)" 

For e x , 0 — x — 1, we have for all n 

| a n | ^31.68 (0.0502)” 

The fact that we are able to do this, and the fact that 
these relations hold for all a n (which we do not show 
here) is no accident. 


The drawbacks of this method are many. One is that 
there is usually no polynomial of finite degree that will fit 
the data exactly even if no noise were present. Next, one 
does not know what significance level to use for the test. 
Also, one must assume that the noise is Gaussian. In 
addition, the test is difficult to apply when the noise is 
correlated. 

The most serious objection to this method is that it does 
not consider any a priori knowledge about the nature of 
the function to be approximated. If we know that there 
exists a polynomial, and its approximate degree, that will 
fit the data “satisfactorily” we really know quite a bit 
about the function to be approximated. As intimated 


We next define analyticity on the line for functions. 
A function f(x) defined on [a, b] is said to be analytic on 
the interval if for an x 0 € [a, b] there is a power series 

2 C» (xo) (x - x 0 ) i 

i = 0 

convergent for | x — x 0 1 = R, which represents the func- 
tion at all points belonging simultaneously to [ a , b] and 
(x 0 — R,x o + R). We denote by A [a, b] the class of func- 
tions analytic on segment [a, b]. If R = oo, the function is 
said to be an entire function. Then we have a result ob- 
tained by Bernstein (see Bernstein, Ref. 3, p. 110, or 
Lorentz, Ref. 2, p. 77, for the proof). 
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Theorem 1 

Let f (x) be continuous on [a, b] and E n — E n ( f ) be the 
best approximation to f(x) by polynomials belonging 
to H n . 

In this case f(x)e A [ a , b] if and only if 

En < Kp n (8) 

where K and p < 1 are constants. 

Moreover, f (x) is an entire function if and only if 

lim (E n )v» - 0 (9) 

n -> oo 

We restate the theorem more precisely in the termi- 
nology of the theory of functions of a complex variable. 


Theorem 2 

A. If the best approximation of the function 

f(x)eC [a, b] [f (x) is continuous] 
is such that 

r l J /n l 

lim|^(/)J R > 1 (10) 

then f(x) is analytic inside an ellipse, the foci of 
which are the points a and b , the semiaxes having 
the sum [(b — a)/ 2] R, with a singular point lying 
on its boundary. 

B. Conversely, if f (x) is analytic inside an ellipse whose 
foci are a and b , and a singular point lies on the 
ellipse boundary, then the best approximation by 
polynomials on [ a , b] satisfies Eq. (10), [( b — a)/2] R 
being the sum of both semiaxes of the ellipse. 

Thus p ^ 1/R, which in turn is determined by the location 
of the singularities of the function. For entire functions 
we can find values of K and p that satisfy the inequality (8) 
with p arbitrarily small. 

These results carry over in the convergence of coeffi- 
cients of Chebyshev series. To establish this we need some 
preliminary results. For simplicity assume that [a y b] has 
been linearly transformed into [ — 1,1]. We introduce a 
standard mapping in the complex plane 

< u > 


where z = x 4- iy and tv — Re'*. The circle in the to-plane 
with R constant >1 maps into the ellipse E R , (x 2 /c 2 ) 
+ (y~/d 2 ) — 1, with the semiaxes c — (1/2) [R + (1/R)] 
and cl = (1/2) [R — (1/R)], and the foci ±1. For fl = 1, 
E r is the interval [ — 1, 1] covered twice. For each point z 
not on [ — 1, 1] there is exactly one ellipse E R , R > 1, pass- 
ing through z y which can be found from the equation 


R = |z± Vz 2 -l| (12) 

where the sign for the square root is chosen to make the 
absolute value of the expression >1. We next have a 
theorem derived by Bernstein (Ref. 3, p. 112) on the 
growth of polynomials in the complex plane. 


Theorem 3 

If a polynomial P n with complex coefficients satisfies 
| P n (x)|^Mfor 1 ~ x — 1, then 

\P n (z)\^MR\ zeE R> R > 1 (13) 


Proof . Let 



Qsn {tV) 
v n ~ 


where Q 2n (u) is a 2nth-degree polynomial in v. For | v | = 1, 
using the property of the mapping (11), 


Q 2 n (tV) 

V n 


Pn(x) 




Thus 


Q 2n (tv) — M 


(14) 


From the maximum principle the inequality (14) holds for 
| w | — 1. Let | tv | — 1/R, so that z — (1/2) [w + (1/tv)] 
lies on the ellipse E*. Then 


Pn(z) 


<?2n M 

tv n 


MR n 


which was to be proved. 

We now can prove the following theorem about the 
convergence of Chebyshev series in a manner that closely 
follows Theorem 2. 
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Theorem 4 

If the Chebyshev series expansion of f (x) € C [ — 1, 1] is 
such that 


IV /* l 

a n\\ H 0 >1 (15) 

then f ( x ) is analytic inside the ellipse E Rq) with a singular 
point lying on E R() . Conversely, if f(x) is analytic inside 
E Rq for R 0 > 1, then the coefficients satisfy Eq. (15). 

Proof. Let D R be the interior of the ellipse E R . We show 
first that if f is analytic in D R , 1 < R < oo, then 

\i/» i 

) -R M 

We expand f(x) on [ — 1,1] in Chebyshev polynomials 
according to Eq. (2) and (3); the fact that f(x) has a con- 
tinuous derivative assures us that the series is uniformly 
convergent. By the substitution cos t — x we obtain 


lim ( a» 

n oo \ 



large radius, and the integral with z n is taken over a circle 
with a small radius. Thus 

= 2 ^ dz + iij c g ^ Z < " +1> dZ 

Let M be the maximum value of | f | on E^. Then the abso- 
lute value of the first integral is bounded by 

h M (w)" 2 ’ RV = MR; " 

In the same way the second integral is bounded by 
MR~ n . Thus 


^ 2M ^ _ /lov 

= n = 1,2, • • • (18) 

It follows that 

(\a n \) 1/n — (2M) 1/n 1/Ri, and lim(|a n |) 1/n — 1/Ri- 


00 

f (cos t) = 2 Uk cos k t 

n = o 

1 f v 

a o=—J f (cos t) dt 

1 

a n = — / f (cos t) cos k t dt , 
irj-ir 


n = 1,2, 


The substitution z = e u in the integral for a„ gives the 
line integral along the circle | z\ = 1 ? 


Since can be taken arbitrarily close to R, we obtain 



Conversely, let the inequality (16) hold for some R, 
1 < R < oo . We will show that f has an analytic extension 
onto each elliptic disc D Rl , 1 < R 1 < R, and therefore 
onto D r . This taken with Eq. (19) will show that the larg- 
est elliptic disc in which f is analytic is D Rq , which will 
prove the theorem. 




z n + z~ n dz 


n = 1,2, 


(17) 


Choose R, with 1 < R, < R. Consider the function 

in the closed ring G bounded by the circles C x : \z \ = 
Rl\ and C 2 : \z\ = R^ From the discussion following 
Eq. (11) we see that if z is on either of the circles \z \ — a 
or \z \ = cr -1 , then = (1/2) (z + z~ x ) is on the ellipse E*. 
Since f (tv) is analytic in D R , it follows that g (z) is analytic 
in the ring G. We now change the path of integration to 
estimate the a n . We split the integral (17) into two parts; 
the integral containing Z' n is taken over a circle with a 


Let Rj < R 2 < R. Then ( | a n | )^ n ^ 1/R 2 for n^N with 
N sufficiently large. It follows that on [ — 1, 1] 


T n (&) 




n > N 


(20) 


where we use the fact that | T n (x) | ^ 1 for xc[ — 1, 1]. 
Applying Theorem 3 to expression (20), using z in D Rv 


fl ” r "( z )|-(l) n , zeDsi> n>N 

( 21 ) 

Hence the Chebyshev series expansion of f(x ), with x 
replaced by z, f (z) = 22U a n T n (z), converges uniformly 
on each compact subset of D R . Its sum is analytic on D R 
and provides the desired analytic extension of f. This com- 
pletes the proof. 
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The importance of Theorem 4 is that it shows that for 
functions analytic on the line there exist bounds for the 
Chebyshev coefficients of the form 


Table 2. Chebyshev polynomial expansion for 
In (1 + x), 0 ^ x 1 , giving coefficients 
a„ of 2 T n (t), t = 2x — 1 


\On\^Kp* 


( 22 ) 


where K and p < 1 are constants, and p is arbitrarily close 
to 1/Ro, given by Theorem 4. 


Consider for example the expansion of f (x) = 4/ (5 + 4x), 
where — l^x^l. Here there is a pole at —5/4. From 
Eq. (12), R = | —5/4 - V 9/16 | = 2. Hence we expect 


for large n. 


a n 





n 


In fact, Elliott (Ref. 4) shows that for all n 


a n 


8( — l) B 
3 



Another example that shows the regularity with which 
a n ^> 0 is given by the expansion of In (1 + r), 
presented in Table 2. 


Figure 1 shows the behavior of log a n vs n. In addition, 
log ( Kp n ) vs n is graphed, which is a straight line. Here 

r = |_3-V3*T71| = 5.8284 * * • 



Fig. 1 . Expansion of In (1 + x) = 2 °n T n (t), 
0^x<1,t = 2x — 1 


n 

— 

On 

0 

0.376 X 10 ° 

1 

0.343 X 1 C f 

2 

- 0.294 X 10- 1 

3 

0.336 X 10' 2 

4 

- 0.433 X 10 3 

5 

0.594 X 1 O ' 4 

6 

- 0.850 X 10 5 

7 

0.125 X 10 -= 

8 

- 0.187 X 10 6 

9 

0.286 X 10“ 7 

10 

— 0.442 X 10“ 8 

11 

0.689 X 10 ® 

12 

- 0.108 X 10 9 

13 

0.171 X 10 ^ 

14 

- 0.273 X 10 “ u 

15 

0.438 X 10 12 

16 

- 0.704 X 10 13 

17 

0.113 X 10- 13 

18 

— 0.184 X \0 U 

19 

0.299 X 10" 15 

20 

- 0.488 X 10 16 

21 

0.798 X 10" 17 

22 

- 0.131 X 10 17 


since the singularity (in the scale we use) occurs at Z = —3. 
The value for p used for the straight line in Fig. 1 is 0.161 
and is not exactly equal to l/R, which is 0.1715 • • • . Since 
only a finite number of terms was used, one can expect 
some discrepancy. 

This raises the question of whether or not we can be sure 
that the function we are approximating is analytic on the 
line, since in practice we do not have an analytic expres- 
sion for f(x). In many cases we can find a differential 
equation for the quantity to be approximated and infer the 
analyticity from the nature of the equation [see for in- 
stance Lefschetz (Ref. 5)]. If the function does seem poorly 
behaved in a small region, it is best to eliminate this region 
from the approximation by using a smaller interval. 

The author has generated tens of thousands of Cheby- 
shev series (Ref. 6) and can report from the many cases 
he has examined that a very simple convergence pattern 
always holds for rapidly converging series. In fact, for 
functions analytic on the line, 

\a n \ = Kp n (23) 
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often seems to hold for large n, when the function is not 
an entire function. In other words, an expression of the 
form Kp n is not merely a good bound, but often is a good 
approximation for | a n | when n is large. This may be partly 
justified from the results of Elliott (Ref. 4), where sharp 
estimates of a n are obtained. 

We shall give a simplified description of some of Elliott’s 
results and refer the reader to Ref. 4 for exact statements. 
Elliott discusses the contributions to the Chebyshev coeffi- 
cients of poles and a branch point on the real axis by evalu- 
ating a contour integral. The result is that the contribution 
to a n is expressed as residues of functions involving f(z) 
evaluated at the singularities of f(z). For large n we need 
only consider the effects of those singularities located on 
E r , where R > 1 is chosen so that there is at least one 
singularity of f (z) on E n and f (z) is analytic in the interior 
of E r . If the singular points on E R consist of a single pole 
of order k y then Elliott shows that for large n 

<-> 

where K is a constant. If k — 1 then Eq. (24) is of the 
form (23). The Schwarz principle of reflection applied 
to f (z) [f (z) is real for real z] shows us that its poles must 
occur at conjugate complex values of z. If we consider 
pairs of poles (or any other singularity) on E Ry we find 
that their individual effects are conjugate complex values 
of each other. Their effects may be characterized by a 
slowly varying term similar to Eq. (24) multiplied by a 
rapidly varying term of maximum amplitude one. 

Similar results hold for the case of a branch point on 
the real axis. Thus when there is only one singularity on 
E Ry Eq. (24) is a good approximation to \a n \, which 
behaves much like Eq. (23) for a restricted range of n. 

Our main hypothesis is that we can use the inequality 
(22) by estimating K and p from a finite number of terms. 
The effect of errors caused by taking a finite number of 
terms will be discussed in connection with a general 
algorithm for implementing this curve-fitting method. 

IV. The Use of Convergence Properties 
in Estimation 

Let us first recall the usual minimum-variance linear 
unbiased estimator (see Scheffe, Ref. 7). Let the 
n-dimensional column vector y represent the observa- 
tions, and 

y — Uq + e (25) 


where U is a known n X m matrix, q is an m-dimensional 
column vector representing the unknown elements we 
wish to estimate, and e is an n-dimensional column noise 
vector. We assume that the expectation of e is zero, 

E(e)= 0 (26) 

and assume that the covariance matrix of the noise 

E(ee T ) = A e (27) 

is known and is positive definite. Here the superscript T 
indicates transpose. If the columns of U are linearly inde- 
pendent, by the Gauss-Markoff theorem we have the un- 
biased minimum-variance estimate of q y 

<7 = (U T A " 1 Uy 1 U T A " 1 y (28) 

The covariance matrix of q is given by 

E[(q-q)(q-q) T ]=(U T A-'U)-' (29) 

We now show how to obtain “better” estimates by mak- 
ing use of the convergence properties of Chebyshev series. 
Since the new estimates may be biased, we define a “best 
estimate” to be a linear minimum second-moment estimate 
of q y *q y where the second moments are taken with respect 
to the true value of q. Since ordinarily it is the size of the 
errors in one experiment that matters, and not whether 
under some hypothetical situation the expectation of the 
errors over the hypothetical ensemble is zero, we dispense 
with the condition of no bias. Thus we seek a linear esti- 
mate q such that its second-moment matrix 

E [(<T- q) (<T- q) T ] - Ar (30) 

is a minimum in some sense. 

In connection with the minimization of matrices we 
review the following definitions. A square matrix A is 
called positive definite if x 1 Ax > 0 for all x^O. If — 0 
for all x and there exists a nonzero vector x for which the 
equality holds, A is called positive semidefinite. By defini- 
tion, a positive definite matrix is less than or equal to a 
second positive definite matrix if the second matrix minus 
the first matrix is positive semidefinite. 

We now state the mathematical problem we shall solve. 
As before we are given a model for a set of observations y, 
as in Eq. (25), with U and y known and with certain known 
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(39) 


statistics about the error vector. However, we now no By definition 
longer assume that E ( e ) = 0, but 

E(e)=e (31) 



where € is unknown. 


Again, 


E ( ee T ) — A e 


(27) 


is assumed to be positive definite. The problem is to 
choose a linear estimator q from the class of linear esti- 
mators such that A 7 =E(q — q)(q~ q) T — A-, where 7f 
is any other linear estimator. Let 


is the augmented U matrix. We consider the special case 
where the augmented A e matrix is of the form 


A' = 



(40) 


When Eq. (36) is evaluated with the new quantities, cer- 
tain simplifications result: 


q = Qy ( 32 ) 

where Q is a given matrix. Hence 

A ? = E {[Q(Uq + e) - q] [Q ( Uq + e) - qY) 

= (Q U - I) q q T (Q U - 1) T + (QU- I) e T Q T 
+ Q€(U T Q T -I) + QA e Q T (33) 

where I is the unit matrix. This results in A 7 being depen- 
dent on the unknown q and e. If we restrict our estimators 
by the condition 

QU = 1 (34) 


'c;=(A-' + U T A- e 'U)-'(Aj\urA- e ')y' (41) 

and 

A 7 =(A-' + UTA- e 'U)-' (42) 

We now have the basic mathematical machinery to 

make use of convergence properties of series. We wish to 
estimate A, the vector of coefficients for the Chebyshev 
polynomial series. Without a priori information 

y = BA + e (43) 


which previously resulted from the condition of no bias, 
then 

A 1 ^QA e Q T (35) 


where the B matrix is known in terms of the Chebyshev 
polynomials. For programming purposes it is convenient 
to have indices start at one, instead of zero. Hence, 


no longer depends on the unknown q or €. This means that ^ ^ 2 n 

the solution to the problem of finding an estimator with a — T j - 1 (ti), * — \ 2 * • L (44) 

minimum reduces to the form of the previous case, with 


q =(U t A 7 UY'U T A 7 y (36) 


where the tilde is used to remind us that E(q — q)^ 0. 
In fact,- using Eq. (34) and (36), 

E(cf — q) — ( U T A " 1 17)- 1 U A - 1 e (37) 

We next develop a special formula for the use of a priori 
information with bias. The most general formula is left as 
an exercise for the reader. We reinterpret the observation 
vector to include the a priori estimate of q, q 0 for the first 
part of y. Thus y' is partitioned as 

(38) 



where n is the number of observations of the particular 
data type, or of the curve we are fitting, L is the dimension 
of A, and r* is the transformed time (to [ — 1,1]) of the 
ith observation. For the moment let us choose L to be so 
large that the error in the representation of E (y) is neg- 
ligible. Mathematically there is no limit to how large L 
might be chosen, although if L > n we have to introduce 
some complicated algebraic notions (such as generalized 
inverses). If the interval on which the observations are 
given is denoted by [t 0 ,t f ], then 


Ti = 


2 

tf to 



to + tf 

2 


)■ 


i = 1,2, • • • n 

(45) 
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The evaluation of a Chebyshev polynomial for Eq. (44) 
may be regarded as a special case of the evaluation of a 
Chebyshev series, which is discussed in the Appendix. 


Now we assume that we know that the series is bounded 
by an expression such as (22). Then there exists an a priori 
estimate of A, which we take to be zero, and a finite 
second-moment matrix of this a priori estimate, which we 
call A a p a . 


From Eq. (5), (6), and (48), we have 


Cii L + 1 
L + 1 


L + 1 



k = i 


i = 1 

7 = 1, 2, • • • m 


Z 

k = l 


5/ (tk) 

dq. 


Ti- 1 ( Tfc ), 


< = 2 , 3 , 
7 = 1 , 2 , 



It is easily shown that the best second-moment matrix 
for the a priori estimate depends on the unknown A. The 
matrix is given by AA T , which obviously is positive semi- 
definite. Since the value of A is unknown, we might 
approximate AA T in some manner by using the available 
observations and any other a priori information. Thus we 
will have to consider the effect of using an approximate 
value for A a . p . a .. This will be done in a more general con- 
text when we consider the effect of using a wrong second- 
moment matrix of the noise. For simplicity we assume that 
the approximate A a . p . a . that is used is positive definite. If 
we assume that the true value of each component of A 
is uncorrelated with the error in any component of obser- 
vation other than A, or 


E ( 0 - ai) (e s ) = 0, 


i = 1,2, • • • L 
j = L + 1, * * * n + L 

(46) 


then we can partition A' as in Eq. (40) and obtain 


where m is the dimension of q. Here we assume that the 
dimension of q is less than that of A and the indicated 
inverses exist. We assume that f(t) is given in the form 


Also 


m 

f(t) = 2 fi ( t ) qi 

j = 1 



If we combine Eq. (49) and (47), 

q =(C r A- 1 C)- 1 C T B T A; 1 y 


(52) 


(53) 


where the largest matrix to be inverted (besides A e , which 
requires special treatment) is 

A? = (C*A?C) (54) 


A = (A-^ + BrA?B)B*A-'y (47) 


where use has been made of the fact that the a priori 
estimate of A is taken to be zero. 


For the case of orbit determination we use A as a means 
of estimating q, the orbital parameters. 


If 


then 


where 


A = Cq 

*q — [C T A" 1 C] -1 C T A” 1 A 
Aa 1 = (A-. a . +B'A?B) 


(48) 

(49) 

(50) 


instead of A " 1 as in Eq. (47) and (49). The disadvantage 
of Eq. (53) is that it deals with coordinate deviations 
directly and the data are not “compressed.” Note that in 
Eq. (53) C T B T is simply the transpose of the matrix U, 
which relates the observations to the orbital elements as 
in Eq. (25). 

We now discuss the consequences of the assumptions 
or approximations that we are making. One assumption, 
that the columns of the U matrix are linearly independent, 
can be dealt with very simply. This is discussed by Scheffe 
(Ref. 7). The result is that if the columns are in fact linearly 
dependent, then the minimum-variance estimates are no 
longer unique as far as their representation in terms of the 
parameters go. However, they are unique in terms of the 
residual quadratic form. For the orbit-determination prob- 
lem this means that if we try to determine too many astro- 
nomical constants from a set of observations of a space 
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probe, the result will be failure to determine the constants 
uniquely, but nevertheless the residuals of the observa- 
tions will be uniquely determined. Thus if one is primarily 
interested in orbit prediction for short times in the future, 
it should not matter whether or not the columns of the 
U matrix are linearly dependent. To simplify the discus- 
sion we can always eliminate some parameters to make the 
columns linearly independent. 

Our next topic is the effect of using approximations for 
A e , in particular for A a . p . a ., which forms a part of A e . A gen- 
eral discussion of the approximation of using a diagonal 
matrix for A e is given by Magness and McGuire (Ref. 8). 
A question they ask is not how much better one can do 
with a minimum-variance estimate, since this is unattain- 
able when the cross correlations are unknown, but whether 
the unknown correlations “hurt” you. Unfortunately, one 
has to go through certain transformations to find out the 
effect of the unknown correlations. 

Here we take a different approach. 

Theorem 5 

Given a model for the observations y = Uq + e, where 
the columns of U are linearly independent and E ( ee T ) = A e 
is positive definite, E ( e ) = e, which may not be zero. Sup- 
pose we construct an estimator for q according to Eq. (36), 
but use an incorrect value of A e , which we denote by r, 
and assume that r is positive definite. Then the matrix 
multiplying the observations is 

Q = (U t T- 1 U)- 1 U t ^ 1 (55) 

Let r ^ A e , or r - A e - S, where S is positive semidefinite. 
Then the actual second-moment matrix of the error in 'q 
will be less than or equal to the nominal one, (U T r _1 C7) _1 . 

Proof . We recall the result from matrix theory that 
U T AU is positive definite if A is positive definite and the 
columns of U are linearly independent. Hence (t7 r r~ 1 U) _1 
is positive definite. Also, the columns of Q T are linearly 
independent by the construction of Eq. (55). Hence the 
actual second-moment matrix, from Eq. (35), is 

Aact = QA € Q T ^Q(A e + S) (56) 

The inequality in Eq. (56) certainly holds if S is positive 
definite. If it is positive semidefinite, the inequality can be 


proved from the continuity of the product as a function 
of S. Also 

Q (Ae + S) Q T = Q V Q T = ( U T T 1 U)~ l (57) 

using Eq. (55). 

Corollary . The same result holds if A<, is positive semi- 
definite and T = S + A e , where T, the nominal-error 
second-moment matrix, is positive definite and S is posi- 
tive semidefinite. 

Equations (34) and (35) will still hold and Eq. (56) 
and (57) follow as before, which proves the corollary. 

Corollary . r [used in Eq. (55) and (57)] may be chosen 
to be a diagonal matrix. 

It is easily shown that if the elements of the diagonal 
matrix r are made large enough, then r — A e is a positive 
semidefinite matrix. In fact, if each element of V is larger 
than the largest eigenvalue of A<>, the corollary holds. 

Incidentally, it is easy to obtain a simple bound for the 
largest eigenvalue of a matrix A. For instance, it can be 
shown that the largest eigenvalue, X wmj , is bounded by 

^ max — max f 2 
; \fc = i 

where n is the dimension of A. (This is a modification of 
a theorem derived by L. Levy.) 


V. An Example 

In Table 1 we gave the Chebyshev expansions for e* 
and e x on the interval [0,1]. Suppose we are trying to 
estimate the function f(x) = q e~ x on the same interval 
[0, 1] where q is unknown. Assume that in reality q = 1. 
Assume that we have some knowledge of the size of a i? 
which was obtained from a study of comparison func- 
tions. For simplicity we will limit the dimension of A to 3, 
or assume that we adequately approximate the function 
with a polynomial of degree 3 — 1=2. We also assume 
that we are limited to three measurements at time x = 0, 
V 2 , and 1. 

If we restrict ourselves to these three time points, it can 
easily be shown that e x evaluated at these points deter- 
mines a second-degree polynomial with slightly different 
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coefficients than given by Table 1. This is obtained by 
evaluating e~ x at these three time points, 0, and 1 
(1, 0.60653, 0.36788 to five decimal places) and interpolat- 
ing. The resulting series is 


f ( T ) = 0.645235 - 0.31606 T x (r) 

+ 0.038705 T 2 (t) - 1 ^ r ^ 1 (58) 

Next, assume that the covariance matrix for A that is 
available to us is 10 times the squares of the values for a\ 
given in Table 1. The factor of 10 provides us with a 
margin of safety for our estimating method: 


shev polynomials, 



( 64 ) 


A : 1 = B T B + A " 1 

fl 1 a.p.a . 


From Eq. (58) and (51), 


3.24037 0 1 

0 3.02037 0 

1 0 68.7462 


(65) 



0 0 
0.97969 0 

0 0.01521 


(59) 


C* = (0.645235, - 0.31606, 0.038705) (66) 

Atf = C* A * 1 C = 1.8037447 (67) 

A'q ~ 0.5544022 (68) 



0 0 
1.02073 0 

0 65.7462 


Also assume that 


A e = I, E (< e ) = 0 


Finally, from Eq. (53), 


(60) 


q = (0.5544022, 0.3362614, 0.2039532) y (69) 


We note that Eq. (68) gives the estimated variance of cjf. 
The actual variance of q is obtained by using Eq. (69) and 
the statistics of y. The result is a variance of 0.462030 plus 
a bias term of 0.166661. The total second moment about 
the true value is 


These are our basic assumptions. Next, we use the usual 
unbiased minimum-variance estimate given by Eq. (28) 
with the covariance matrix given by Eq. (29). 

Here 

U T = (1, 0.60653, 0.36788) (62) 

Then the variance of the estimate given by Eq. (28) is, 
from Eq. (29), 

(U T 17) i = 0.66524 (63) 

We now redo the problem, using our a priori knowl- 
edge of the size of the coefficients, and see what we gain. 
From Eq. (44) and a knowledge of the behavior of Cheby- 


0.462030 + (0.166661) 2 - 0.48979 

which is smaller than that given by Eq. (63) (0.66524), 
using the usual formula. The reason that the actual sec- 
ond moment is smaller than the calculated one given by 
Eq. (63) is that we allowed ourselves plenty of margin for 
error in estimating the size of a n . 

VI. General Conclusion 

The method we used to improve our polynomial fitting 
can be used in the estimation of other parameters if we 
give up the condition of unbiasedness. Equation (28) is 
the best linear unbiased estimate. If we can determine 
bounds for E(qq T ), then we can employ the same pro- 
cedure and use an a priori estimate of q as zero, with the 
associated second-moment matrix. 
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Appendix 

The Evaluation of Chebyshev Series 


The evaluation of a series of orthogonal polynomials can 
be conveniently accomplished by a method described in 
Ref. 6. For a Chebyshev polynomial series the algorithm 
is as follows: 

Given the series 

p (x) = 2 a n T n ( x ) 

n = o 

Define 

b n +i b n+2 0 

Compute recursively 

b n = 2x b n+1 - b n+2 + a n , n = N, N - 1, * • • 1 
b 0 = xb 1 — b 2 + g 0 
p (x) = b 0 

These formulas may also be used in curve fitting as in 
Eq. (5) and (6). If maximum accuracy is desired for curve 
fitting, another method is recommended. This alternative 
method is most useful when n is large and the round- 
off error in x , as used in the above algorithm, may be 
significant. 

To obtain the Chebyshev coefficients in curve fitting, 
we make use of the definition of T n (x) as given by Eq. (4) 
when used in Eq. (5) and (6), which results in 


We now observe that the cosine function need only be 
evaluated in the first quadrant for the values 

7 t k . ^ , 

cos — — , k = 0 , 1 , • • • m 

Z m 

For values of n (2/ — 1) that are greater than m, the quad- 
rant L is found from 

where integer arithmetic is used for the first division. 
Then the well-known formulas for transforming the cosine 
function to a function of the first quadrant are used to 
obtain 

k 0 ~n ( 2 / — 1 ) [mod m] 

T n (xf'' = cos-^ — , L = 1 
n ’ 2 m 

T (xf) = — cos — , L = 2 
" ■* 2 m 

where k = m — fc 0 » and 

T .(*?)= - cos i„' L = 3 
T -w = ““is- L = 4 

where k = m — k 0 . 


m 

“■ “ i w T -[ cos i • 


j = l 


?'=l 


P n (2/ ~ I) - ! 

L2 m y 


n = 1,2, ■ ■ ■ N 
n = 1, 2, ■ • • N 
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